Abstract - This paper proposes a method to discriminate
natural and manmade scenes of similar depth. In manmade
scenes, roughness increases with image depth; natural
scenes, on the contrary, appear smoother at greater depth.
This arrangement of pixels in the scene structure is captured
by the local texture information of a pixel and its
neighborhood. The proposed method analyzes the local texture
information of a scene image using the texture unit matrix.
Final classification is performed by unsupervised learning
with a Self Organizing Map (SOM). Owing to its low
computational complexity, the technique is suitable for
online classification.

Index Terms - Image-depth, Texture unit, Texture unit matrix, 
scene image, Self Organizing Map (SOM) 

I. Introduction 

Natural and manmade scene images behave differently as
image depth varies, where the depth of an image is the mean
distance of the object from the viewer. Near manmade
structures exhibit a homogeneous and smooth view, as shown
in Figure 1(b). With increasing depth, the smoothness of a
manmade image decreases because other artifacts enter the
scene. A 'near' natural scene, on the other hand, is perceived
as a textured region of high roughness, as in Figure 1(a). In
'far' natural scenes, textured regions are replaced by low
spatial frequency [9] components and appear smooth. In
summary, 'far' manmade and 'near' natural structures share a
rough appearance, while 'near' manmade and 'far' natural
scenes share a smooth one. This textural difference can be
exploited to discriminate natural scenes from manmade scenes
of similar depth.




Figure 1. (a) Rough image (b) Smooth image

Texture can be described as a repetitive pattern of local
variations in image intensity. In a scene image, texture
provides measures of scene attributes such as smoothness,
coarseness and regularity. These extracted features constitute
the feature vector, which is analyzed further to classify scene
images [1]. He and Wang [2] proposed a statistical approach to
texture analysis termed the 'Texture unit approach', which is
used in this paper to distinguish between manmade and natural
scenes. Here the local texture information of a given pixel
and its neighborhood is characterized by a 'Texture unit',
which is then used to quantify texture and to construct the
feature vector. A SOM classifier gives the final
classification result. The process is shown in the block
diagram of Fig. 2.

Figure 2. Block diagram

Serrano et al. [3] proposed a two-level method of scene
classification. At the first level, low-level feature sets
such as color and wavelet texture features are used to predict
multiple semantic scene attributes, and classification with a
support vector machine gives an indoor/outdoor classification
accuracy of about 89%. At the next level, the semantic scene
attributes are integrated using a Bayesian network, and an
improved indoor/outdoor classification result of 90.7% is
obtained. Raja et al. [4] proposed a method to separate the
war scene category from the natural scene category. They
extracted wavelet features from the images, trained and tested
the feature vectors with a feed-forward back-propagation
artificial neural network, and reported a classification
success of 82%. Using the same database, Raja et al. [5]
extracted features using Invariant Moments and the Gray Level
Co-occurrence Matrix (GLCM), and reported that GLCM feature
extraction with a Support Vector Machine classifier gave
results of up to 92%. Chen et al. [6] proposed a
classification technique that uses the Texture Unit Coding
(TUC) concept to classify mammograms. TUC generates a texture
spectrum for a texture image, and the discrepancy between two
texture spectra is measured using an information divergence
(ID)-based discrimination criterion. They applied TUC along
with ID to the classification of masses in mammograms.
Barcelo et al. [7] proposed a texture characterization
approach that uses the texture spectrum method and fuzzy
techniques to define 'texture unit boxes', which also accounts
for the vagueness introduced by noise and by the different
capture and digitization processes. Karkanis et al. [8]
computed features based on the run length of the spectrum
image, representing textural descriptors of the respective
regions. They characterized different textured regions within
the same image and applied the approach successfully to
endoscopic images to distinguish normal from cancerous
regions. Bhattacharya et al. [9] used texture spectrum
concepts with 3x3 as well as 5x5 windows to reduce noise in
satellite data. Al-Janobi [10] proposed a texture analysis
method that combines properties of both the gray-level
co-occurrence matrix (GLCM) and texture spectrum (TS) methods.
The method was used to obtain the texture information of an
image and was evaluated on Brodatz's natural texture images.
Chang et al. [11] extended Texture Unit Coding (TUC) and
proposed gradient texture unit coding (GTUC), which captures
the gradient changes in gray level between the central pixel
and its two neighboring pixels in a texture unit (the two
pixels considered in TUC), along two different orientations.
Jiji et al. [12] proposed a method for segmentation of color
texture images using a fuzzy texture unit and a color fuzzy
texture spectrum. After color texture is located locally as
well as globally, segmentation is performed by the SOM
algorithm. Rath et al. [13] proposed a Gabor filter based
scheme to segregate monocular images of real-world natural
scenes from manmade structures. Lee et al. [14] proposed a
method for texture analysis using fuzzy uncertainty. They
introduced the fuzzy uncertainty texture spectrum (FUTS) and
used it as the texture feature for texture analysis. He and
Wang [15] simplified the texture spectrum by reducing the
6,561 texture units to 15 units without significant loss of
discriminating power, and corroborated the claim by
experiments on Brodatz's natural texture images.

In this paper, we propose a method in which images of
similar depth are classified into manmade and natural classes.
In the first stage, scene images are converted to texture unit
matrices, and feature vectors are generated from these
matrices. In the second stage, the feature vectors are
presented to a SOM to obtain the classification results. Thus
'near' and 'far' scene images are classified separately into
'natural' and 'manmade' classes.

The paper is organized as follows. Section II discusses the
basic concepts of the texture unit and its extended versions,
base-5 and base-7, followed by an explanation of the ordering
ways of a texture unit and a brief description of the
classifier used in our work, the Self Organizing Map. Section
III describes our experimental algorithm, and Section IV
presents a detailed discussion of the experiments and results.
Section V concludes the paper and discusses a possible
implementation of our technique as a real-time application.

II. Texture Unit Approach

He and Wang [1] proposed a statistical approach to texture
analysis termed the texture unit approach. Here the local
texture information of a given pixel and its neighborhood is
characterized by the corresponding texture unit. The unit
extracts the textural information of an image because it
accounts for all eight directions, one for each of the eight
neighbors. In this work a neighborhood is a 3x3 window with
the image pixel at its center.

A. Base3 texture unit number 

In the texture unit approach, a texture image is decomposed
into a set of essential small units called texture units. The
3x3 neighborhood is denoted by a set V of nine elements:

\[ V = \{V_0, V_1, V_2, V_3, V_4, V_5, V_6, V_7, V_8\} \]

where $V_0$ is the intensity value of the central pixel and
$V_1, \ldots, V_8$ are the intensity values of the neighboring
pixels, written $V_i$, $i = 1, 2, \ldots, 8$.

The corresponding texture unit is then the set

\[ TU = \{E_1, E_2, \ldots, E_8\} \]

whose elements $E_i$, $i = 1, 2, \ldots, 8$, are computed as
follows:

\[
E_i =
\begin{cases}
0 & \text{if } V_i > V_0 - \Delta \text{ and } V_i < V_0 + \Delta \\
1 & \text{if } V_i < V_0 \\
2 & \text{if } V_i > V_0
\end{cases}
\tag{1}
\]

where $\Delta$ is the gray level tolerance limit.

The tolerance limit is introduced so that textured and
non-textured regions give distinct responses; its value is
kept very small.

The intensity values $V_i$ of the 3x3 window are now replaced
by the corresponding $E_i$. $TUN_{Base3}$ ranges from 0 to
6560. The texture unit number in base 3 is calculated as
follows:

\[
N_{TUBase3} = \sum_{i=1}^{8} E_i \times 3^{i-1}
= E_1 \times 3^0 + E_2 \times 3^1 + E_3 \times 3^2 + E_4 \times 3^3
+ E_5 \times 3^4 + E_6 \times 3^5 + E_7 \times 3^6 + E_8 \times 3^7
\tag{2}
\]

where
$N_{TUBase3}$: texture unit number with respect to base 3;
$E_i$: $i$-th element of the texture unit set;
$TU$: $\{E_1, E_2, \ldots, E_8\}$.
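
The base-3 computation can be illustrated with a minimal Python sketch. The clockwise neighbor order starting at the top-left corner, the function names and the sample pixel values are assumptions made for illustration only; the paper does not fix them. The tolerance is assumed to be positive.

```python
# Neighbour positions of a 3x3 window, clockwise from the top-left corner.
# The starting corner and direction are assumptions, not fixed by the text.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def base3_texture_unit(window, delta=5):
    """Return [E_1..E_8] for a 3x3 window following Eq. (1); assumes delta > 0."""
    v0 = window[1][1]
    unit = []
    for r, c in NEIGHBOURS:
        v = window[r][c]
        if v0 - delta < v < v0 + delta:
            unit.append(0)  # within the tolerance band around V_0
        elif v < v0:
            unit.append(1)  # darker than the central pixel
        else:
            unit.append(2)  # brighter than the central pixel
    return unit


def base3_tun(unit):
    """Texture unit number of Eq. (2): sum of E_i * 3**(i-1)."""
    return sum(e * 3 ** i for i, e in enumerate(unit))


# Hypothetical window with central value 140.
window = [[10, 200, 10],
          [140, 140, 141],
          [90, 150, 139]]
unit = base3_texture_unit(window)
print(unit, base3_tun(unit))  # [1, 2, 1, 0, 0, 2, 1, 0] 1231
```

With all eight elements equal to 2 the sum is 3^8 - 1 = 6560, which matches the upper end of the stated range.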

B. Base-5 Texture Unit Matrix ($TUM_{Base5}$)

The base-3 texture unit approach cannot distinguish 'less'
from 'far less', or 'greater' from 'far greater', relative to
the gray level of the central pixel. To incorporate this
texture feature on a 3x3 window, the $TUM_{Base5}$ and
$TUM_{Base7}$ approaches were proposed [16].

\[
E_i =
\begin{cases}
0 & \text{if } V_i > V_0 - \Delta \text{ and } V_i < V_0 + \Delta \\
1 & \text{if } V_i < V_0 \text{ and } V_i < X \\
2 & \text{if } V_i < V_0 \text{ and } V_i > X \\
3 & \text{if } V_i > V_0 \text{ and } V_i < Y \\
4 & \text{if } V_i > V_0 \text{ and } V_i > Y
\end{cases}
\tag{3}
\]

where $X$ and $Y$ are user-specified threshold limits and
$\Delta$ is the gray level tolerance limit.



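[Figure: number line marked X, V_0 - Δ, V_0, V_0 + Δ, Y, with
regions E_i = 1 (V_i < X), E_i = 2 (V_i > X), E_i = 0 (between
V_0 - Δ and V_0 + Δ), E_i = 3 (V_i < Y) and E_i = 4 (V_i > Y)]

Figure 3. Base-5 texture unit representation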

Fig. 3 illustrates the base-5 approach of (3). The
corresponding texture unit is a set of eight elements,
TU = {E1, E2, ..., E8}. In this approach, (3) is used to
determine the elements Ei of the texture unit, and the texture
unit number is computed using (4). In Fig. 4(a) a 3x3 window
is taken whose central pixel value is 140. Using (3), a
texture unit element is generated for each neighboring pixel
value, as shown in Fig. 4(b). An ordering way is then chosen,
as discussed in Section II-D, and the texture unit number is
calculated. $TUN_{Base5}$ ranges from 0 to 2020: the minimum,
0, is obtained when all Ei in (4) are 0, and the maximum,
2020, when all Ei in (4) are 4.



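\[
N_{TUNBase5} = \sum_{i=1}^{8} E_i \times 5^{(i-1)/2}
= E_1 \times 5^{0} + E_2 \times 5^{0.5} + E_3 \times 5^{1} + E_4 \times 5^{1.5}
+ E_5 \times 5^{2} + E_6 \times 5^{2.5} + E_7 \times 5^{3} + E_8 \times 5^{3.5}
\tag{4}
\]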
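
As a hedged illustration of (3) and (4), the sketch below assumes that Eq. (4) reads as the sum of E_i x 5^((i-1)/2) rounded to the nearest integer, a reading that reproduces the value 1114 quoted for the texture unit [1 2 0 4 4 0 1 3]. The neighbor values, the thresholds X = 125 and Y = 180 and the function names are hypothetical; the actual window of Fig. 4(a) is not recoverable here. Ties at exactly X or Y are sent to the higher code, which is also an assumption.

```python
def base5_texture_unit(neighbours, v0, x, y, delta=5):
    """Return [E_1..E_8] following Eq. (3); assumes delta > 0 and x < v0 < y."""
    unit = []
    for v in neighbours:
        if v0 - delta < v < v0 + delta:
            unit.append(0)                  # close to the central pixel
        elif v < v0:
            unit.append(1 if v < x else 2)  # far less / less
        else:
            unit.append(3 if v < y else 4)  # greater / far greater
    return unit


def base5_tun(unit):
    """Texture unit number under the assumed reading of Eq. (4)."""
    return round(sum(e * 5 ** (i / 2) for i, e in enumerate(unit)))


# Hypothetical neighbour values around a central pixel of 140.
neighbours = [100, 130, 140, 200, 250, 138, 120, 160]
unit = base5_texture_unit(neighbours, v0=140, x=125, y=180)
print(unit, base5_tun(unit))  # [1, 2, 0, 4, 4, 0, 1, 3] 1114
```

Under this reading the all-fours unit evaluates to about 2019.3, close to the quoted maximum of 2020.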



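©2013 ACEEE
DOI: 03.LSCS.2013.3.78

C. Base-7 Texture Unit Matrix ($TUM_{Base7}$)

Similarly, a base-7 approach to the texture unit is proposed
[16], in which two threshold limits are taken. The range of
$TUN_{Base7}$ is 0 to 1172.

\[
E_i =
\begin{cases}
0 & \text{if } V_i > V_0 - \Delta \text{ and } V_i < V_0 + \Delta \\
1 & \text{if } V_i < V_0 \text{ and } V_i < X_l \text{ and } V_i < Y_l \\
2 & \text{if } V_i < V_0 \text{ and } V_i > X_l \text{ and } V_i < Y_l \\
3 & \text{if } V_i < V_0 \text{ and } V_i > X_l \text{ and } V_i > Y_l \\
4 & \text{if } V_i > V_0 \text{ and } V_i < X_u \text{ and } V_i < Y_u \\
5 & \text{if } V_i > V_0 \text{ and } V_i > X_u \text{ and } V_i < Y_u \\
6 & \text{if } V_i > V_0 \text{ and } V_i > X_u \text{ and } V_i > Y_u
\end{cases}
\tag{5}
\]

where $X_l$, $Y_l$, $X_u$, $Y_u$ are user-defined threshold
limits and $\Delta$ is the gray level tolerance limit.

The texture unit number in base 7 is computed as:

\[
TUN_{Base7} = \sum_{i=1}^{8} E_i \times 7^{(i-1)/3}
\tag{6}
\]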
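
A corresponding sketch for base 7, assuming Eq. (6) reads as the sum of E_i x 7^((i-1)/3) rounded to the nearest integer and that the thresholds are ordered X_l < Y_l < V_0 < X_u < Y_u. The ordering, the example threshold values and the function names are assumptions, not values given in the paper.

```python
def base7_element(v, v0, xl, yl, xu, yu, delta=5):
    """Return E_i following Eq. (5); assumes xl < yl < v0 < xu < yu, delta > 0."""
    if v0 - delta < v < v0 + delta:
        return 0
    if v < v0:
        if v < xl:
            return 1
        return 2 if v < yl else 3
    if v < xu:
        return 4
    return 5 if v < yu else 6


def base7_tun(unit):
    """Texture unit number under the assumed reading of Eq. (6)."""
    return round(sum(e * 7 ** (i / 3) for i, e in enumerate(unit)))


# Hypothetical thresholds around a central pixel of 140.
codes = [base7_element(v, 140, xl=60, yl=100, xu=180, yu=220)
         for v in (50, 80, 120, 142, 160, 200, 240)]
print(codes)               # [1, 2, 3, 0, 4, 5, 6]
print(base7_tun([6] * 8))  # 1172
```

The all-sixes unit gives 1172 under this reading, which agrees with the upper end of the stated range.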

[1 2 0 4 4 0 1 3] -> 1114

Figure 4. Base-5 technique (a) 3x3 window (b) Texture unit (c)
Texture unit to texture unit number



D. Ordering way of Texture Unit number 

An ordering way is an arrangement of the eight elements of
the texture unit of a 3x3 window (Fig. 4(a) and 4(b)). The
elements can be read cyclically starting from any one of the
eight elements, which gives 8 possible ordering ways for a
window. Thus every window provides 8 texture unit numbers for
an image pixel. Fig. 5 gives an example of a texture unit
with its 8 possible ordering ways and the corresponding TUNs.



[1 2 0 4 4 0 1 3] -> 1114
[2 0 4 4 0 1 3 1] -> 777
[0 4 4 0 1 3 1 2] -> 906
[4 4 0 1 3 1 2 0] -> 405
[4 0 1 3 1 2 0 4] -> 1297
[0 1 3 1 2 0 4 4] -> 1696
[1 3 1 2 0 4 4 0] -> 759
[3 1 2 0 4 4 0 1] -> 618

Figure 5. Example showing 8 ordering ways and 8 texture unit
numbers
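
The eight ordering ways can be generated as cyclic rotations of the texture unit. The sketch below is only an illustration: it assumes the base-5 texture unit number is the rounded sum of E_i x 5^((i-1)/2), and the function names are hypothetical. Under that assumption it reproduces the eight values of the Fig. 5 example.

```python
def ordering_ways(unit):
    """All cyclic rotations of a texture unit, one per starting element."""
    return [unit[k:] + unit[:k] for k in range(len(unit))]


def base5_tun(unit):
    """Base-5 texture unit number (assumed reading: rounded weighted sum)."""
    return round(sum(e * 5 ** (i / 2) for i, e in enumerate(unit)))


unit = [1, 2, 0, 4, 4, 0, 1, 3]
tuns = [base5_tun(u) for u in ordering_ways(unit)]
print(tuns)       # [1114, 777, 906, 405, 1297, 1696, 759, 618]
print(min(tuns))  # 405
```

The minimum over the eight values does not depend on which neighbor is read first, which is one reason a pixel-wise minimum is a convenient summary of the ordering ways.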

Fig. 6 displays a synthetic texture image and its Texture 
unit matrices in base-5 for 8 ordering ways. 

E. Self Organizing Map 

Self-organizing feature maps (SOFM) learn, in an
unsupervised way, to classify input vectors according to how
they are grouped in the input space [17]. When an input is
presented to the first layer, the layer computes the distance
between the input vector and the weights associated with each
neuron. In each iteration these distances are compared by the
compete transfer function at the output layer and the winner
neuron is decided: the winning neuron outputs 'one' and all
others output 'zero'. The weight of the winner neuron is
updated using the Kohonen rule, and the next iteration
follows. After the specified number of iterations, the final
classification result is obtained.
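
The winner-take-all training described above can be sketched in a few lines of Python. This is a simplified illustration and not the authors' implementation: only the winner is updated (no neighborhood function), the weights are initialized from randomly chosen inputs, and the learning rate, seed and function names are assumptions.

```python
import random


def winner(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    dists = [sum((a - b) ** 2 for a, b in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_competitive(vectors, n_classes=2, epochs=500, lr=0.1, seed=0):
    """Winner-take-all training with the Kohonen rule w += lr * (x - w)."""
    rng = random.Random(seed)
    # Assumed initialisation: start from randomly chosen input vectors.
    weights = [list(v) for v in rng.sample(vectors, n_classes)]
    for _ in range(epochs):
        for x in vectors:
            w = weights[winner(weights, x)]
            for j in range(len(w)):
                w[j] += lr * (x[j] - w[j])  # move only the winner toward x
    return weights


# Toy 2-D data with two well-separated groups (hypothetical values).
data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
        [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
weights = train_competitive(data, epochs=50)
labels = [winner(weights, x) for x in data]
print(labels)
```

The cluster indices are arbitrary, so which index corresponds to 'manmade' and which to 'natural' has to be decided afterwards by inspecting the members of each cluster.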

Fig. 6. (a) Texture image (b) - (h) Texture unit matrix for base-5 
ordering way 1 to 8 

III. Texture Unit Approach

In this algorithm we have used the base-5 technique. The
steps to compute the feature vector are as follows:

1. 150 scene images of size 128x128 are considered.
2. A 3x3 window is chosen, taking each image pixel as its
   central pixel.
3. To process the border pixels, the image is padded with 2
   rows and 2 columns, making its size 130x130.
4. For each window, 8 texture units (TU), one for every
   possible ordering way, are generated using (3).
5. Each texture unit (TU) is then converted to a texture unit
   number (TUN) using (4). Thus each image pixel corresponds
   to 8 TUNs.
6. The above process is repeated for all image pixels. Each
   image pixel is replaced by its corresponding TUN, which
   results in 8 texture unit matrices of size 128x128.
7. The feature matrix (128x128) is obtained by taking the
   pixel-wise minimum of the above 8 TUN matrices.
8. The feature matrix is then downsampled to an 8x8 matrix
   and reshaped into a 64x1 column vector, which is the
   feature vector of the image.
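
The steps above can be sketched end to end in plain Python. Several details are assumptions because the paper does not specify them: edge-replicating padding, absolute thresholds X = 64 and Y = 192, downsampling by keeping every (n/8)-th pixel, row-major flattening, and a base-5 texture unit number computed as the rounded sum of E_i x 5^((i-1)/2). All names are hypothetical.

```python
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
WEIGHTS = [5 ** (i / 2) for i in range(8)]  # assumed base-5 weights


def base5_unit(neigh, v0, x, y, delta):
    """Base-5 texture unit elements for eight neighbour values."""
    unit = []
    for v in neigh:
        if v0 - delta < v < v0 + delta:
            unit.append(0)
        elif v < v0:
            unit.append(1 if v < x else 2)
        else:
            unit.append(3 if v < y else 4)
    return unit


def min_tun(unit):
    """Minimum texture unit number over the 8 ordering ways."""
    rotations = (unit[k:] + unit[:k] for k in range(8))
    return min(round(sum(e * w for e, w in zip(u, WEIGHTS))) for u in rotations)


def feature_vector(img, x=64, y=192, delta=5, out=8):
    """64-element feature vector of a square grayscale image (list of rows)."""
    n = len(img)
    # Pad one pixel on every side (2 extra rows and columns) by replication.
    rows = [[r[0]] + list(r) + [r[-1]] for r in img]
    padded = [rows[0]] + rows + [rows[-1]]
    feat = [[min_tun(base5_unit(
                [padded[r + 1 + dr][c + 1 + dc] for dr, dc in OFFSETS],
                img[r][c], x, y, delta))
             for c in range(n)] for r in range(n)]
    step = n // out  # assumed downsampling: keep every step-th pixel
    return [feat[r * step][c * step] for r in range(out) for c in range(out)]


flat = [[128] * 16 for _ in range(16)]              # uniform test image
edge = [[0] * 8 + [255] * 8 for _ in range(16)]     # vertical step edge
print(set(feature_vector(flat)))                    # {0}
print(sorted(set(feature_vector(edge))))            # [0, 8]
```

A uniform image yields an all-zero feature vector, while the step edge produces non-zero entries only where a sampled pixel touches the edge.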

IV. Experimental Result And Analysis 

Images of manmade and natural scenes are obtained from
various sources of scene images. Sample images from all four
classes are shown in Fig. 10 and Fig. 11: manmade 'near' and
natural 'near' scenes (Figure 10), and manmade 'far' and
natural 'far' scenes (Figure 11). It has been observed that
manmade 'near' images are smoother than manmade 'far' images.
Similarly, natural 'far' images appear smoother than natural
'near' images. We have therefore split our database of 300
images into two databases of 150 images each, a 'near' image
database and a 'far' image database, according to human
perception. For the 'near' image database we have considered
natural scene images of bushes, leaves, etc. and manmade
scenes such as toys and house interiors, with depth within 10
meters. Similarly, for the 'far' image database we have
considered natural scene images of open fields, panoramic
views and mountains, and manmade scenes of inside-city views,
tall buildings and urban views, with depth of about 500
meters. In this experiment the SOM classifier classifies
scene images into manmade and natural categories when it is
provided with input scenes of similar depth.

Images used in this paper are of size 128x128. We have
experimented with the base-3, base-5 and base-7 methods of TUN
and inferred that the pixel-wise minimum of the texture unit
matrices over all ordering ways gives the best results. Table
I lists the performance of the minimum TUN matrix in base-3,
base-5 and base-7; the base-5 method gives the fewest
misclassifications in both the 'near' and 'far' image
databases.

In an image, each pixel is taken in turn as the central
pixel and a 3x3 window is selected around it. We found that a
3x3 window captures texture variations better than larger
window sizes. In each window the eight neighbors cover all
angular orientations (0°, 45°, 90°, 135°, 180°, 225°, 270°,
315°). The texture unit array is then generated using (3),
with the gray level tolerance limit chosen as Δ = 5. Eight
texture units (TU) are obtained for a window, corresponding
to the 8 possible ordering ways. Each TU is converted to its
corresponding TUN, which replaces the central image pixel.
Thus one image pixel gives rise to eight TUNs, and the entire
image gives rise to 8 TUN matrices of size 128x128, the same
as the image size. The pixel-wise minimum of these eight
matrices constitutes the feature matrix of size 128x128. The
feature matrix is downsampled to an 8x8 matrix and reshaped
into a 64x1 column vector, which is the feature vector of the
image. The process is repeated to produce 150
(64-dimensional) feature vectors for each database. These
feature vectors are then presented to a SOM classifier.
Output responses (base-5 texture unit matrices) of some scene
images are displayed in Fig. 7.

In our experiment an unsupervised classification technique
is used. We have employed a Self Organizing Map (SOM)
classifier to classify the scene images. To classify the
'near' scenes, 150 (64-dimensional) feature vectors are given
as inputs to the SOM. The number of output classes is set to
two (manmade and natural). The network is trained for 500
iterations, as we have found that a higher number of
iterations does not improve the result. The classification
result is 98% for the 'near' database. Similarly, for 'far'
scenes we obtained a classification result of 96%. The number
of misclassifications is 3 for the 'near' database and 6 for
the 'far' database; these occur where the depth of the scene
is ambiguous.
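
The reported percentages follow directly from the misclassification counts: 147 of 150 is 98% and 144 of 150 is 96%. Because an unsupervised classifier returns arbitrary cluster indices, accuracy for two clusters can be scored as the better of the two possible cluster-to-class assignments. The sketch below illustrates this; the 75/75 class split and the function name are hypothetical, since the per-class counts of each database are not stated.

```python
def clustering_accuracy(true_labels, cluster_ids):
    """Two-class accuracy under the better cluster-to-class assignment."""
    n = len(true_labels)
    agree = sum(t == c for t, c in zip(true_labels, cluster_ids))
    return max(agree, n - agree) / n


# Hypothetical 'near' database: 150 images, 3 of them misclassified.
truth = [0] * 75 + [1] * 75
near = list(truth)
for i in (0, 80, 149):
    near[i] = 1 - near[i]

# Hypothetical 'far' database: 6 misclassified, cluster indices swapped.
far = [1 - t for t in truth]
for i in (1, 2, 3, 90, 91, 92):
    far[i] = 1 - far[i]

print(clustering_accuracy(truth, near))  # 0.98
print(clustering_accuracy(truth, far))   # 0.96
```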

As shown in Fig. 8 and Fig. 9, TUN is plotted against image
number. TUN values for 50 'near' images (25 from each
category) are plotted in Fig. 8(a), (b) and (c) for base-3,
base-5 and base-7 respectively. Manmade 'near' images are
shown with a star marker and natural 'near' images with a
pentagon marker. The graphs show that in the 'near' image
database the inter-class gap between manmade 'near' and
natural 'near' is more distinct in the base-3 technique
(Fig. 8(a)) than in base-5 (Fig. 8(b)) and base-7
(Fig. 8(c)). Similar behavior is noticed in the 'far' image
database: base-3 (Fig. 9(a)) shows a better inter-class gap
than base-5 (Fig. 9(b)) and base-7 (Fig. 9(c)).

The numbers of misclassifications for the base-3, base-5
and base-7 methods are tabulated in Table I for the 'near'
and 'far' image databases. With the SOM classifier, base-5
gives fewer misclassifications than base-3 and base-7. We
have therefore used the base-5 approach to compute the TUN.

Figure 7. Texture unit matrix outputs (b) of some specimen images

Table I. Comparison of Base-3, Base-5, Base-7 approach

Figure 10. Sample images belonging to 2 classes: 'near' natural
scene (Row-1); 'near' manmade scene (Row-2)

V. Conclusions

In this work we have proposed a method in which the
textural information of a scene image is captured by the
texture unit matrix. We found that computing the texture unit
matrix in base-5 and taking the pixel-wise minimum over all
ordering ways produced the best results: 98% for the 'near'
scene image database and 96% for the 'far' scene image
database. This method may be utilized for automated
classification of scene images irrespective of depth.





Figure 11. Sample images belonging to 2 classes: 'far' natural
scene (Row-1); 'far' manmade scene (Row-2)